Document representation with Generalized Latent Semantic Analysis
نویسندگان
چکیده
Methods for dimensionality reduction, notably LSA, have been successfully applied to the information retrieval task and document classification. Recently, corpus-based association measures such as point-wise mutual information have been found to outperform LSA on a variety of tasks. We have developed an algorithmic framework that computes a low-dimensional vector space representation of documents combining different measures of association with different dimensionality reduction techniques. Experimental results show a competitive performance on the synonymy and text classification tasks.
منابع مشابه
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
متن کاملGraph-Based Generalized Latent Semantic Analysis For Document Representation
Document indexing and representation of term-document relations are very important for document clustering and retrieval. In this paper, we combine a graph-based dimensionality reduction method with a corpus-based association measure within the Generalized Latent Semantic Analysis framework. We evaluate the graph-based GLSA on the document clustering task.
متن کاملTerm Representation with Generalized Latent Semantic Analysis
Document indexing and representation of termdocument relations are very important issues for document clustering and retrieval. In this paper, we present Generalized Latent Semantic Analysis as a framework for computing semantically motivated term and document vectors. Our focus on term vectors is motivated by the recent success of co-occurrence based measures of semantic similarity obtained fr...
متن کاملTerms and Document Representation with Generalized Latent Semantic Analysis
Document indexing and representation of term-document relations are very important issues for document clustering and retrieval. In this paper, we present Generalized Latent Semantic Analysis as a framework for computing semantically motivated term and document vectors. Our focus on term vectors is motivated by recent success of co-occurrence based measures of semantic similarity obtained from ...
متن کاملQuery expansion based on relevance feedback and latent semantic analysis
Web search engines are one of the most popular tools on the Internet which are widely-used by expert and novice users. Constructing an adequate query which represents the best specification of users’ information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...
متن کاملComputing Term Translation Probabilities with Generalized Latent Semantic Analysis
Term translation probabilities proved an effective method of semantic smoothing in the language modelling approach to information retrieval tasks. In this paper, we use Generalized Latent Semantic Analysis to compute semantically motivated term and document vectors. The normalized cosine similarity between the term vectors is used as term translation probability in the language modelling framew...
متن کامل